more error handling for compute loop #1152

alexcos20 · 2026-01-15T09:15:43Z

Changes proposed in this PR:

more error handling in compute loop
use setTimeout/clearTimeout instead of setInterval/clearInterval

see #1122 for context

alexcos20 · 2026-01-15T09:17:11Z

/run-security-scan

alexcos20

AI automated code review (Gemini 3).

Overall risk: low

Summary:
This pull request significantly enhances the robustness and reliability of the C2D compute engine's internal loop. The core change involves transitioning from setInterval to a setTimeout-based recurring loop, which is a safer pattern for asynchronous tasks as it prevents overlapping executions. Additionally, the InternalLoop function now includes a try...catch...finally block, ensuring that the next iteration of the loop is always scheduled, even if errors occur during job retrieval or processing. This prevents the compute engine from silently failing or getting stuck. Error logging for the internal loop has also been improved.

Comments:
• [INFO][style] Changing setNewTimer from async to synchronous is appropriate here since it no longer awaits getComputeEnvironments() and directly calls setTimeout, which doesn't return a promise that needs awaiting.
• [INFO][performance] The change from setInterval to setTimeout is a crucial improvement for robust asynchronous loops. This pattern ensures that a new InternalLoop execution is only scheduled after the previous one has completed (or failed), preventing overlapping executions and resource contention, especially when job processing times can vary.
• [INFO][other] The check if (this.cronTimer) { return } ensures idempotency if setNewTimer is called externally while a timer is already active. Within InternalLoop, this.cronTimer is explicitly cleared before setNewTimer is called, so it will always schedule a new one there, which is the desired behavior.
• [WARNING][bug] The condition changed from (await this.getComputeEnvironments()).length > 0 to this.envs.length > 0. Please confirm that this.envs is always guaranteed to be accurately populated and up-to-date before setNewTimer is called. If getComputeEnvironments() performs any critical initialization or fetches the latest state, bypassing it could lead to issues where the engine operates with stale or incomplete environment data. If this.envs is just a cached value, this change is a performance optimization and is fine.
• [INFO][bug] The addition of the try...catch...finally block significantly improves the robustness of the InternalLoop. Now, even if an error occurs during fetching jobs or processing them, the engine will log the error and crucially, always reschedule the next loop execution in the finally block. This prevents the C2D engine from silently stopping due to an unhandled error.
• [INFO][style] Correctly adapting the timer clearing logic from clearInterval to clearTimeout is good. The if (this.cronTimer) check also handles cases where cronTimer might already be null without errors.

more error handling for compute loop

bdd46d2

alexcos20 marked this pull request as ready for review January 15, 2026 09:16

alexcos20 requested review from andreip136, bogdanfazakas, denisiuriet, giurgiur99 and ndrpp as code owners January 15, 2026 09:16

alexcos20 commented Jan 15, 2026

View reviewed changes

alexcos20 mentioned this pull request Jan 15, 2026

Fix loop #1122

Closed

giurgiur99 approved these changes Jan 15, 2026

View reviewed changes

alexcos20 merged commit 9cd4598 into main Jan 15, 2026
19 of 20 checks passed

alexcos20 deleted the bug/more_error_handling_compute_loop branch January 15, 2026 13:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

more error handling for compute loop #1152

more error handling for compute loop #1152

alexcos20 commented Jan 15, 2026 •

edited

Loading

Uh oh!

alexcos20 commented Jan 15, 2026

Uh oh!

alexcos20 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

more error handling for compute loop #1152

more error handling for compute loop #1152

Conversation

alexcos20 commented Jan 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alexcos20 commented Jan 15, 2026

Uh oh!

alexcos20 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

alexcos20 commented Jan 15, 2026 •

edited

Loading